Parsing a BEL Script with PyBEL is as simple as:
>>> import pybel
>>> graph = pybel.from_url('...')
However, the simple functions exposed at the package level obscure the caching functionality. When loading multiple BEL Scripts, the following code would be very slow:
import pybel

my_urls = ['... url 1 ...', '... url 2 ...', ...]
graphs = [
    pybel.from_url(url)
    for url in my_urls
]
This is because PyBEL manages a connection to a local SQLite cache, which must be rebuilt in memory each time the function is called. Instead, a single CacheManager can be instantiated once and shared across calls:
import pybel
from pybel.manager import CacheManager

manager = CacheManager()

my_urls = ['... url 1 ...', '... url 2 ...', ...]
graphs = [
    pybel.from_url(url, connection=manager)
    for url in my_urls
]
Other common patterns, including loading a list of graphs and taking their union, have been implemented in the PyBEL Tools IO Utilities submodule. See: http://pybel-tools.readthedocs.io/en/latest/ioutils.html
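For intuition about the union pattern, it can be sketched without PyBEL at all: merging graphs amounts to merging their node and edge sets. The representation below, a (nodes, edges) pair of sets, is a deliberately simplified stand-in and not the PyBEL graph API:

```python
def union(graphs):
    """Merge several graphs, each given as a (nodes, edges) pair of sets,
    into a single graph containing every node and edge."""
    all_nodes, all_edges = set(), set()
    for nodes, edges in graphs:
        all_nodes |= nodes
        all_edges |= edges
    return all_nodes, all_edges

g1 = ({'A', 'B'}, {('A', 'B')})
g2 = ({'B', 'C'}, {('B', 'C')})
merged = union([g1, g2])
# merged == ({'A', 'B', 'C'}, {('A', 'B'), ('B', 'C')})
```

The real implementation in PyBEL Tools additionally merges edge metadata, which this sketch omits.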
The cache manager uses SQLite by default because it requires zero configuration. Better performance can be achieved by switching to a relational database management system such as MySQL or PostgreSQL.
This can be done by passing an RFC-1738 database connection string as the connection argument to the CacheManager constructor:
from pybel.manager import CacheManager
connection = 'mysql+pymysql://<username>:<password>@<host>/<dbname>?charset=utf8[&<options>]'
manager = CacheManager(connection=connection)
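As a sketch of the string's anatomy, an RFC-1738 URL can be assembled from its components with plain string formatting. The helper function and credentials below are placeholders, not part of PyBEL:

```python
def build_mysql_url(username, password, host, dbname, charset='utf8'):
    """Assemble an RFC-1738 connection string for the mysql+pymysql dialect.
    A real application should URL-encode the password if it contains special
    characters (e.g. with urllib.parse.quote_plus)."""
    return f'mysql+pymysql://{username}:{password}@{host}/{dbname}?charset={charset}'

url = build_mysql_url('pybel_user', 'secret', 'localhost', 'pybel')
# 'mysql+pymysql://pybel_user:secret@localhost/pybel?charset=utf8'
```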
A default connection string can be set by following the instructions in the documentation at http://pybel.readthedocs.io/en/latest/constants.html#configuration-loading